updated: 2022-05-03_12:33:36-04:00
Today's assignment is to look at the data and get comfortable with it.
- Load the housing data (house_data.csv) from Moodle into R.
- Explore the data
- Check whether there are any NAs
- Remove the columns lat, long and view, which do not contribute to the model.
- If you can’t load everything, take a subset of the data, maybe 5000 rows
- Use the glimpse function to explore the data
- Get a summary of the data
- Plot a histogram of the price distribution
- Plot a histogram of the distribution of the number of bedrooms
- Count the houses by number of bedrooms: how many houses have 1 bedroom, 2 bedrooms, and so on.
- Get a box plot to see how the price and number of bedrooms are associated
- Get a box plot to see how the price and number of bathrooms are associated
- Plot price against square feet
- Plot price against number of bedrooms
- Create a linear regression model with all the variables
- Identify the significant variables.
- Explain the coefficients. Which variable has the highest impact?
- Plot the correlations using ggcorrplot.
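
The steps above could be sketched roughly as follows. This is only a starting point, not a model answer. The column names price, bedrooms, bathrooms and sqft_living are assumptions about house_data.csv (only lat, long and view are named in the task), and the 0.05 cutoff for "significant" is a common convention rather than something the assignment specifies. If the file is not in the working directory, the sketch falls back to a small made-up data frame so it still runs; that data is synthetic and says nothing about real houses.

```r
set.seed(1)  # reproducible sampling

if (file.exists("house_data.csv")) {
  houses <- read.csv("house_data.csv")
} else {
  # Synthetic stand-in (NOT real data), using the assumed column names.
  n <- 200
  bedrooms <- sample(1:5, n, replace = TRUE)
  bathrooms <- sample(1:3, n, replace = TRUE)
  sqft_living <- round(runif(n, 500, 4000))
  houses <- data.frame(
    price = 50000 + 150 * sqft_living + 10000 * bedrooms + rnorm(n, sd = 20000),
    bedrooms = bedrooms,
    bathrooms = bathrooms,
    sqft_living = sqft_living,
    lat = runif(n, 47, 48),
    long = runif(n, -123, -121),
    view = sample(0:4, n, replace = TRUE)
  )
}

# Count missing values per column.
na_counts <- colSums(is.na(houses))
print(na_counts)

# Drop lat, long and view.
houses <- houses[, !(names(houses) %in% c("lat", "long", "view")), drop = FALSE]

# Work on a random subset if the data is large.
if (nrow(houses) > 5000) {
  houses <- houses[sample(nrow(houses), 5000), ]
}

# glimpse() comes from dplyr; fall back to str() if dplyr is not installed.
if (requireNamespace("dplyr", quietly = TRUE)) {
  dplyr::glimpse(houses)
} else {
  str(houses)
}
print(summary(houses))

# Distributions of price and number of bedrooms.
hist(houses$price, main = "Price", xlab = "price")
hist(houses$bedrooms, main = "Bedrooms", xlab = "bedrooms")

# How many houses have 1 bedroom, 2 bedrooms, and so on.
bedroom_counts <- table(houses$bedrooms)
print(bedroom_counts)

# Price by number of bedrooms and by number of bathrooms.
boxplot(price ~ bedrooms, data = houses)
boxplot(price ~ bathrooms, data = houses)

# Price against square feet and against number of bedrooms.
plot(price ~ sqft_living, data = houses)
plot(price ~ bedrooms, data = houses)

# Linear regression on all numeric columns (non-numeric ones such as dates
# are left out here to keep the sketch simple).
numeric_houses <- houses[, sapply(houses, is.numeric), drop = FALSE]
model <- lm(price ~ ., data = numeric_houses)
print(summary(model))

# Terms with p-value below 0.05 (assumed cutoff).
coefs <- summary(model)$coefficients
significant <- rownames(coefs)[coefs[, "Pr(>|t|)"] < 0.05]
print(significant)

# Correlation plot; ggcorrplot is optional, so print the matrix otherwise.
corr <- cor(numeric_houses, use = "complete.obs")
if (requireNamespace("ggcorrplot", quietly = TRUE)) {
  print(ggcorrplot::ggcorrplot(corr))
} else {
  print(round(corr, 2))
}
```

When reading the coefficients, keep in mind that each one is in units of price per one unit of its variable, so a raw coefficient for bedrooms is not directly comparable to one for square feet.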